Customer churn prediction has recently become one of the vital issues confronting diverse
business industries that seek to sustain their customer base and profits. At the same time,
data scientists employ very large volumes of customer data to automate the data modelling
process and to offer the resulting models as a generally portable service. This research makes
two main contributions: a deep learning customer churn prediction model and a smart evaluation
prediction model service. The service harnesses any customer data to automate the building,
evaluation, and deployment of the churn prediction model. The research consists of three main
parts. Firstly, it illustrates the dataset labelling, which annotates customer data as churn or
non-churn. Secondly, it presents the deep learning churn prediction framework, which uses the
convolutional neural network (CNN) algorithm. Finally, a case study is presented to show how the
churn prediction service is automatically trained and generated based on real customer data,
where the CNN parameters are adapted to achieve the most reliable performance in line with
customers' behavior. The applied case study achieves an accuracy of 0.77, an area under the
curve (AUC) of 0.84, and an F1 score of 0.83.


1. INTRODUCTION 

Machine learning models have been used for prediction in different fields [1]-[4]. Nowadays, diverse 
business markets have reached a state of congestion and face brutal competition between service providers. 
This competition arises from market saturation, with abundant service providers and a wide diversity of 
product offers. Herein, churn prediction is a business use case that applies various data mining techniques to 
detect the customers who are likely to cancel their subscription to a particular service [5], [6]. Customer 
behavior changes in line with the defined business use case, so customizing a prediction model for each case 
costs redundant time and effort [7]. For this reason, data scientists automate the data modelling process, which 
generalizes it and allows it to be offered as a service [8]. This research is part of implementing a customer 
relationship management (CRM) system called customer loyalty intelligent personalization (CLIP). CLIP is a 
smart, machine learning based personalized customer advisory system. CLIP can serve different kinds of 
business applications. It aims to assist e-commerce and retail businesses in retaining their profits and their 
customer base. This paper proposes a framework for automating customer churn prediction with respect to the 
business use case. The paper is organized as follows: section 2 outlines a literature review, section 3 illustrates 
the customer labelling dataset, section 4 gives an overview of the implemented framework, and section 5 
presents a case study on a client’s real data. 


2. BACKGROUND 

Churn prediction has been studied in various research works with different aims. Authors have focused 
on comparing different techniques and approaches in particular domains. This section outlines some of the 
recent research. Castanedo et al. [9] performed churn prediction by leveraging deep learning architectures 
from image classification. Supervised learning was performed on over 6 million customers using deep 
convolutional neural networks (CNNs), which achieved an area under the curve (AUC) of 0.778 on the test 
dataset. The study's main weakness is the extremely scarce user-user input data, which grows significantly 
when long-term user interactions are taken into account. Therefore, an input data architecture may encode 
these long-term connections among users in order to better predict long-term interactions in a 
telecommunication dataset. 

In another study, Wangperawong et al. [10] represented customer temporal behavioral data as images 
in order to perform churn prediction with deep learning architectures popular in image classification. 
Deep CNNs were used for supervised learning on labelled data from over 6 million users, and they produced 
an AUC of 0.743. However, no more than 12 temporal features were employed for each customer. More 
features could be added to increase the effectiveness of the input images. 

Ismail et al. [11] presented a multilayer perceptron (MLP) neural network approach to forecast 
customer churn at one of the top telecommunications firms in Malaysia. Its outcome showed that, in 
prediction tasks, neural networks were superior to statistical models (91.28% prediction accuracy). 
However, there were only 78 churners and 58 non-churners in the training set, and 13 churners and 
10 non-churners in the testing set. This dataset is very small and cannot reasonably be used to forecast 
churn in the telecom sector. Additionally, the dataset used is private, so performance comparison is not 
possible. 

Tariq et al. [12] suggested a model that employs a 2-D CNN, a deep learning technique. The model 
features a layered design with two distinct phases: a layer for data import and preprocessing, and another 
layer for the 2-D CNN. The data are processed in a parallel environment using the Apache Spark distributed 
and parallel framework. The training data are taken from the Telco customer churn dataset on Kaggle. The 
suggested model achieved an accuracy of 0.963 out of 1, with a very small loss during training and 
validation (0.004). According to the confusion matrices, the true-positive and true-negative rates are 95% 
and 94%, respectively. The authors reported only the performance data and did not compare their model 
with any other models; however, the false-negative rate is only 5% and the false-positive rate is only 6%, 
which is effective. 

Cenggoro et al. [13] employed a vector embedding model to estimate customer loss for a telecom 
dataset of 3,333 users, without contrasting the suggested model with any other models. The model's accuracy 
and F1-score were the only metrics provided, and they show that the model does a good job of differentiating 
between churning and non-churning clients. Zhou et al. [14] proposed a model based on a long short-term 
memory (LSTM) network and a CNN, with cross-layer connections between the LSTM layers and the 
convolution layers. This model learns the latent sequential information and captures important local features 
from time series features. Experimental results on a real-life dataset showed that the proposed model 
performed better than the other compared models. 

Zhong and Li [15] proposed a CNN-based predictive model to detect churn signals from transcript 
data of phone calls. Experimental results showed that, when sufficient training data was provided with their 
text annotation method, the CNN-based predictive model generated state-of-the-art performance in churn 
prediction. Finally, Pirmohammadi and Mast [5] applied a multi-layer perceptron ANN with 8 neurons in a 
hidden layer, whose best performance appeared at epoch 10. Then, the structural model of the network was 
added. In addition, a regression test was performed in SPSS to predict customer churn. To assess the 
statistical performance of the ANN model in classifying output and target values, the mean squared error 
(MSE) and root mean squared error (RMSE) were computed, which are equal to 0.15599 and 0.39495 
respectively. 

This research proposes an automated solution for the customer churn prediction model. It applies a 
deep learning algorithm to predict future customers’ churn rates based on real client data. The proposed 
solution applies a CNN algorithm to build the churn prediction model. It automatically labels the customer 
data and then divides it into training, validation, and test sets. Following that, the CNN parameters are 
adapted to achieve the best prediction model based on customer behavior with respect to the given business 
case. The implemented software is portable and customizable for generic e-commerce and retail business cases. 


LABELING CUSTOMER DATASET 

The first essential step in churn prediction is to assess each customer's behavior. The customers’ data 
is not labelled or classified beforehand as churned or not; however, this can be inferred from their previous 
purchase transactions. This research proposes a methodology that illustrates how to infer whether a 
customer’s behavior indicates a possible churn or not. The main churn configurables, which are calculated 
from the real customer data, are: purchase frequency bypass times, reduction average purchase percentage, 
and average reduction purchase times. Figure 1 shows the pseudo-code to annotate customer data as churn 
or not churn. Firstly, it fetches the customer’s first and last visit dates. Secondly, it calculates the customer’s 
number of purchases and average purchase value. Thirdly, it labels customers as churned or not based on the 
values obtained above. After that, the data is divided into training, validation, and testing sets for further 
processing. 

Customers Labeling - Pseudo Code 
1. Fetch client last purchase date 
2. Fetch customer first visit date (first purchase date) 
3. Fetch customer last visit date (last purchase date) 
4. Calculate purchase frequency 
   Days of (last purchase date - first purchase date) / customer's purchases count 
5. Calculate average purchase value 
   Customer's total purchases / customer's purchases count 
6. Determine customer's number of purchases 
   If number of purchases is 1 THEN label customer as NOT APPLICABLE 
7. Determine churn by frequency bypass times 
   If (client last purchase date - customer last purchase date) / customer purchase frequency >= 
   frequency bypass times THEN label customer as CHURNED 
8. Determine churn by purchase reduces for last n times (n reduces times from configurable section) 
   nFrequencies = Get customer purchase for last n frequencies 
   Foreach frequency in nFrequencies 
      Calculate percentage between frequency purchases and average purchases 
   If all percentage calculation <= reduce percentage 
      THEN label customer as CHURNED 
9. Otherwise, label customer as NOT CHURNED 


Figure 1. Customer labelling pseudocode 
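
The labelling steps in Figure 1 can be sketched in Python as follows. This is a minimal illustration and not 
the implemented software: the names `ChurnConfig` and `label_customer`, the threshold values, and the 
one-day floor on the purchase frequency are assumptions, and step 8 is simplified to compare each of the last 
n purchases with the customer's average purchase value. 

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class ChurnConfig:
    """Hypothetical configurable thresholds; the values are illustrative only."""
    frequency_bypass_times: float = 3.0  # threshold used in step 7
    reduce_percentage: float = 0.5       # threshold used in step 8
    reduce_times: int = 3                # n in step 8


def label_customer(purchases: List[Tuple[date, float]],
                   client_last_purchase: date,
                   cfg: ChurnConfig) -> str:
    """Label one customer from (purchase date, purchase value) pairs."""
    purchases = sorted(purchases)
    count = len(purchases)
    if count <= 1:                                    # step 6
        return "NOT APPLICABLE"
    first, last = purchases[0][0], purchases[-1][0]   # steps 2 and 3
    # Step 4: days per purchase. The floor of one day is an assumption that
    # avoids a division by zero when all purchases fall on the same day.
    frequency = max((last - first).days / count, 1.0)
    average_value = sum(value for _, value in purchases) / count  # step 5
    # Step 7: too many purchase cycles have passed since the last visit.
    idle_cycles = (client_last_purchase - last).days / frequency
    if idle_cycles >= cfg.frequency_bypass_times:
        return "CHURNED"
    # Step 8 (simplified): every one of the last n purchases is well below
    # the customer's average purchase value.
    recent = purchases[-cfg.reduce_times:]
    if (average_value > 0 and len(recent) == cfg.reduce_times and
            all(value / average_value <= cfg.reduce_percentage
                for _, value in recent)):
        return "CHURNED"
    return "NOT CHURNED"                              # step 9
```

Here `client_last_purchase` stands for the most recent purchase date in the whole client dataset (step 1), so 
the idle time of each customer is measured against the end of the observed data rather than the current date. 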


3. CUSTOMER CHURN PREDICTION FRAMEWORK 

The CNN is one of the best-known deep learning algorithms; its main strength is performing feature 
engineering without the need for domain expertise [5]-[16]. In this research, the CNN algorithm is applied to 
build the customer churn prediction model. The CNN hyperparameters, such as weight constraint, dropout 
rate, number of filters, number of dense neurons, kernel size, batch size, and momentum, are initialized 
randomly. Then, these hyperparameters are repeatedly changed to fit the built CNN model to the customer 
data. The output model is considered a customized churn prediction model, which can be deployed in a 
specific business case. 

Figures 2 and 3 show the workflow that automatically transforms raw customer data into a customized 
churn prediction model. The input data represents the customer behavior; it includes the feature columns 
and the actual label indicating whether the customer is a churner or a non-churner. The output of the workflow 
is a prediction model. Figure 2 shows the basic processes that prepare the data before modelling: 
preprocessing, feature engineering, and data splitting. The input in Figure 2 is the raw customer data, and the 
output is three organized datasets: training, validation, and testing. The training set is 60% of the full data, 
and the validation and testing sets are 20% each, used to report the model performance. Figure 3 shows the 
details of the auto-modelling processes that search for the best-fitted prediction model for the provided 
customer data. The previously formed datasets are the input to the three main auto-modelling processes: 
training, validating, and testing. In the auto-training process, the CNN hyperparameters form a list of 
combinations. These combinations are automatically applied to build and train the CNN, generating different 
prediction models. In the auto-validating process, the validation dataset evaluates each CNN model generated 
in the training process, and the most accurate model is saved. In the auto-testing process, the testing dataset 
evaluates the accepted CNN model; the evaluation metrics are reported, and the model is saved for predicting 
unseen customer data. 
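
The splitting and selection logic described above can be sketched as follows. This is an illustrative outline 
under stated assumptions, not the implemented service: `split_dataset` and `select_best` are hypothetical 
names, and `fit` and `score` are placeholders for whatever routine trains a CNN and computes its validation 
accuracy. 

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_dataset(rows: Sequence, train: float = 0.6, val: float = 0.2,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle rows and cut them into training, validation, and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # the remainder is the testing part


def select_best(candidates: List[Dict], fit: Callable, score: Callable,
                train_set, val_set):
    """Train one model per hyperparameter combination and keep the one
    with the highest validation score."""
    best = None
    for params in candidates:
        model = fit(params, train_set)       # auto-training
        val_score = score(model, val_set)    # auto-validating
        if best is None or val_score > best[0]:
            best = (val_score, params, model)
    return best  # (validation score, hyperparameters, model) or None
```

The testing part is deliberately left out of `select_best`, so it is only used once, on the accepted model, 
which mirrors the auto-testing process. 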


Smart evaluation for deep learning model: churn prediction as a product case ... (Esam Mohamed Elgohary) 


1222 ISSN: 2302-9285 


[Figure 2 diagram: Input data -> Data preparation -> Feature extraction & engineering -> Data splitting -> 
Train data, Validate data, Test data] 


Figure 2. Churn prediction model framework 
Figure 3. Data modelling processes 


A case study presents a sample of real customer data that is used to build a customized churn 
prediction model. The purpose of this section is to show how the prediction model is auto-trained and 
auto-evaluated to reach the most recommended model. The four basic counts used to evaluate any 
supervised machine learning algorithm are true positive (TP), true negative (TN), false positive (FP), and false 
negative (FN) [17]. These counts are the basis for various evaluation equations, such as accuracy and F1 
score, shown in (1) and (2) respectively. In the churn prediction case, TP is the number of customers correctly 
predicted as churners, and TN is the number of customers correctly predicted as non-churners. FP is the 
number of customers who are non-churners but whom the predictive model has labelled as churners. FN is 
the number of customers who are churners but whom the predictive model has labelled as non-churners. 

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ F1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)} \tag{2} \]
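
As a small worked example, accuracy (1) and F1 score (2) can be computed directly from the four counts. The 
counts used below are hypothetical and are not taken from the case study. 

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (1): share of all customers that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Equation (2): harmonic mean of precision and recall, written with counts."""
    return tp / (tp + 0.5 * (fp + fn))


# Hypothetical confusion-matrix counts for 100 customers.
tp, tn, fp, fn = 40, 30, 10, 20
print(round(accuracy(tp, tn, fp, fn), 2))  # 0.7
print(round(f1_score(tp, fp, fn), 2))      # 0.73
```

Note that TN does not appear in (2), so the F1 score is unaffected by how many non-churners are classified 
correctly; this is why it is reported alongside accuracy. 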


Any CNN has a collection of hyperparameters, as mentioned above [16]-[23]. Each hyperparameter has 
a preferred value range and an impact on the performance of the built model. These hyperparameters can 
therefore form different combinations of values. Each combination is applied to build, train, and evaluate the 
CNN using the equations above. In this case study, the data sizes are 476, 119, and 149 for training, 
validation, and testing respectively. Table 1 shows 20 out of the 768 combinations of CNN hyperparameters 
to illustrate their impact on both training and validation performance. The accepted model attained an 
accuracy of 0.78 in training and 0.77 in testing, and an F1 score of 0.85 in training and 0.83 in testing. 
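
The list of hyperparameter combinations can be generated as a Cartesian product. The sketch below is an 
assumption-laden illustration: the full search grid is not listed in this paper, so only the filter, kernel, and 
dense-neuron values, and the first value of every other entry, come from Table 1. The second values for weight 
constraint, learning rate, momentum, dropout rate, and batch size are hypothetical, chosen so that the grid has 
the reported 768 combinations. 

```python
from itertools import product

# First value of each list appears in Table 1. The second values marked
# "hypothetical" are illustrative placeholders, not the authors' settings.
grid = {
    "weight_constraint": [4, 5],        # 5 is hypothetical
    "learning_rate": [0.001, 0.01],     # 0.01 is hypothetical
    "momentum": [0.1, 0.5],             # 0.5 is hypothetical
    "dropout_rate": [0.4, 0.5],         # 0.5 is hypothetical
    "cnn_filters": [32, 64, 128, 256],
    "kernel_size": [3, 4],
    "dense_neurons": [64, 128, 256],
    "batch_size": [10, 20],             # 20 is hypothetical
}

# One dictionary per combination, in the order produced by itertools.product.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combinations))  # 768
print(combinations[0])
```

Each dictionary in `combinations` would then be passed to the routine that builds and trains one CNN, so the 
number of trained models grows multiplicatively with every hyperparameter added to the grid. 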

In addition, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) 
[24], [25] are further evaluation measurements, which assess the model performance based on the TP and FP 
rates. Figure 4 shows two evaluation graphs for the best-fit model generated in this case study. The right 
graph displays the ROC curve, whose AUC of 0.84 indicates high reliability in predicting unseen data, and 
the left graph displays the F1 score for both training and testing. 
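
For intuition, the AUC equals the probability that a randomly chosen churner receives a higher predicted 
score than a randomly chosen non-churner. The following sketch computes it by direct pairwise comparison; 
the function name `roc_auc` and the sample labels and scores are hypothetical, and the quadratic loop is only 
suitable for small datasets. 

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (churner, non-churner) pairs ranked correctly.

    labels: 1 for churner, 0 for non-churner.
    scores: predicted churn scores; ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one churner and one non-churner")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four customers.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of churners from non-churners. 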
Table 1. CNN hyperparameters experiments 
Weight      Learning  Momentum  Dropout  CNN      Kernel  Dense    Batch  Acc.   F1     Acc.  F1
constraint  rate                rate     filters  size    neurons  size   train  train  val.  val.
4           0.001     0.1       0.4      32       3       64       10     0.78   0.85   0.77  0.85
4           0.001     0.1       0.4      32       3       128      10     0.77   0.84   0.74  0.83
4           0.001     0.1       0.4      32       3       256      10     0.75   0.84   0.74  0.84
4           0.001     0.1       0.4      32       4       64       10     0.77   0.84   0.78  0.86
4           0.001     0.1       0.4      32       4       128      10     0.76   0.84   0.73  0.83
4           0.001     0.1       0.4      32       4       256      10     0.78   0.86   0.74  0.84
4           0.001     0.1       0.4      64       3       64       10     0.79   0.85   0.78  0.87
4           0.001     0.1       0.4      64       3       128      10     0.80   0.86   0.78  0.87
4           0.001     0.1       0.4      64       3       256      10     0.75   0.84   0.73  0.84
4           0.001     0.1       0.4      64       4       64       10     0.78   0.85   0.81  0.88
4           0.001     0.1       0.4      64       4       128      10     0.75   0.84   0.73  0.83
4           0.001     0.1       0.4      64       4       256      10     0.75   0.84   0.74  0.84
4           0.001     0.1       0.4      128      3       64       10     0.77   0.85   0.77  0.85
4           0.001     0.1       0.4      128      3       128      10     0.78   0.85   0.78  0.86
4           0.001     0.1       0.4      128      3       256      10     0.75   0.84   0.74  0.84
4           0.001     0.1       0.4      128      4       64       10     0.78   0.85   0.79  0.87
4           0.001     0.1       0.4      128      4       128      10     0.81   0.87   0.79  0.87
4           0.001     0.1       0.4      128      4       256      10     0.76   0.85   0.74  0.84
4           0.001     0.1       0.4      256      3       64       10     0.77   0.84   0.77  0.85
Figure 4. Case study evaluation graphs 


4. CONCLUSION 


In the thriving technological era, markets are overloaded with various service providers, which 
escalates the competition between companies to preserve their customer bases and financial gains. Churn 
prediction is a problem that has recently intrigued various researchers and business leaders. However, 
modelling customer data for each business case to generate a churn prediction model consumes too much 
time and effort. This research therefore proposed an automated customer churn prediction service using the 
CNN algorithm. It facilitates the generation of a deep learning churn prediction model for each business case 
based on its customer behavior. A case study is presented to show the automatic adaptation of the CNN 
hyperparameters until a decision is made to select the best-fit model. In this case study, the AUC reached a 
reliable 0.84. This research can contribute to automatically predicting and evaluating customer churn rates in 
both e-commerce and retail business applications. 